State Aggregation for Distributed Value Iteration in Dynamic Programming

Authors

Abstract

We propose a distributed algorithm to solve a dynamic programming problem with multiple agents, where each agent has only partial knowledge of the state transition probabilities and costs. We provide consensus proofs for the presented algorithm and derive error bounds on the obtained value function with respect to what is considered as the "true solution" obtained from conventional value iteration. To minimize communication overhead between agents, state costs are aggregated and shared between agents only when the updated costs are expected to influence the solution of other agents significantly. We demonstrate the efficacy of the proposed aggregation method on a large-scale urban traffic routing problem, in which individual agents compute the fastest route to a common access point while sharing local congestion information, allowing for fully distributed routing with minimal communication between agents.
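
The sharing rule is only hinted at above ("when the updated costs are expected to influence the solution of other agents significantly"). Below is a minimal sketch of that idea, assuming synchronous rounds, a tabular MDP whose states are partitioned across agents, and a simple drift threshold `eps` standing in for the paper's actual aggregation criterion; all names and shapes are assumptions, not the authors' implementation.

```python
import numpy as np

def distributed_vi(P, c, owners, n_agents, gamma=0.95, eps=1e-2, iters=500):
    """Sketch of distributed value iteration with thresholded sharing.
    P: (S, A, S) transition tensor, c: (S, A) stage costs,
    owners: owners[s] = index of the agent that updates state s."""
    S, A, _ = P.shape
    # Each agent keeps its own (possibly stale) copy of the value vector.
    V = [np.zeros(S) for _ in range(n_agents)]
    last_sent = np.zeros(S)   # value last broadcast for each state
    messages = 0
    for _ in range(iters):
        # Local Bellman updates: each agent touches only the states it owns,
        # so it only needs transitions and costs out of those states.
        for s in range(S):
            a_own = owners[s]
            Q = c[s] + gamma * P[s] @ V[a_own]   # (A,) action values
            V[a_own][s] = Q.min()
        # Communication: re-broadcast a state's value only if it has
        # drifted enough since the last broadcast (assumed rule).
        for s in range(S):
            v = V[owners[s]][s]
            if abs(v - last_sent[s]) > eps:
                last_sent[s] = v
                messages += 1
                for a in range(n_agents):
                    V[a][s] = v
    return V, messages
```

With `eps = 0` every update is broadcast and the scheme reduces to synchronous value iteration; larger thresholds trade solution accuracy for fewer messages.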


Similar Articles

Value Iteration with Options and State Aggregation

This paper presents a way of solving Markov Decision Processes that combines state abstraction and temporal abstraction. Specifically, we combine state aggregation with the options framework and demonstrate that the two work well together; indeed, the full benefit of each is realized only once they are combined. We introduce a hierarchical value iteration algorithm where we first ...
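
The truncated abstract does not fully specify the hierarchical algorithm, so the sketch below only illustrates the two ingredients working together: a value-iteration backup that ranges over primitive actions and options alike (each reduced to a multi-step reward and discounted transition model), while values are kept constant on each aggregate. Shapes and names are assumptions.

```python
import numpy as np

def vi_with_options_and_aggregation(models, cluster, n_clusters, iters=200):
    """Sketch, not the paper's algorithm.
    models: list of (R, P) pairs, one per primitive action or option,
    with R of shape (S,) the multi-step expected reward and P of shape
    (S, S) the already-discounted (terminal-state) transition model.
    cluster: (S,) integer array, cluster[s] = aggregate containing s."""
    w = np.zeros(n_clusters)            # one value per aggregate
    for _ in range(iters):
        V = w[cluster]                  # lift cluster values to states
        # Backup over primitive actions and options alike.
        Q = np.stack([R + P @ V for (R, P) in models])  # (num_models, S)
        V_new = Q.max(axis=0)           # reward maximization
        # Project back onto piecewise-constant functions.
        for k in range(n_clusters):
            w[k] = V_new[cluster == k].mean()
    return w
```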


Performance Loss Bounds for Approximate Value Iteration with State Aggregation

We consider approximate value iteration with a parameterized approximator in which the state space is partitioned and the optimal cost-to-go function over each partition is approximated by a constant. We establish performance loss bounds for policies derived from approximations associated with fixed points. These bounds identify benefits to using invariant distributions of appropriate policies ...
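
A minimal sketch of the setting these bounds concern: exact Bellman backups followed by a weighted projection onto functions that are constant on each partition. The weights `mu`, standing in for the invariant distribution the abstract mentions, and all shapes are assumptions.

```python
import numpy as np

def aggregated_avi(P, g, part, n_parts, mu, alpha=0.95, iters=300):
    """Sketch of approximate value iteration with state aggregation.
    P: (S, A, S) transitions, g: (S, A) one-stage costs,
    part: (S,) integer array, part[s] = partition index of state s,
    mu: (S,) positive projection weights (e.g., an invariant distribution)."""
    theta = np.zeros(n_parts)                    # one constant per partition
    for _ in range(iters):
        V = theta[part]                          # piecewise-constant cost-to-go
        TV = (g + alpha * (P @ V)).min(axis=1)   # exact Bellman backup, (S,)
        # mu-weighted projection onto the span of the partition indicators.
        for k in range(n_parts):
            mask = (part == k)
            theta[k] = np.average(TV[mask], weights=mu[mask])
    return theta
```

Iterating this projected backup drives `theta` toward a fixed point; the bounds in the paper concern policies that are greedy with respect to such fixed points.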


Unifying Value Iteration, Advantage Learning, and Dynamic Policy Programming

Approximate dynamic programming algorithms, such as approximate value iteration, have been successfully applied to many complex reinforcement learning tasks, and a better approximate dynamic programming algorithm is expected to further extend the applicability of reinforcement learning to various tasks. In this paper we propose a new, robust dynamic programming algorithm that unifies value iter...
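
The abstract is cut off before the unified operator is stated, so no attempt is made to reconstruct it; as background, here is a sketch of one ingredient named in the title, the advantage-learning backup, which penalizes the action gap. Tabular, reward-maximizing setting; the gap parameter `alpha` and all shapes are assumptions.

```python
import numpy as np

def advantage_learning_backup(Q, P, r, gamma=0.99, alpha=0.9):
    """One sweep of the advantage-learning operator (sketch):
    T_AL Q = r + gamma * E[max_a' Q(s', a')] - alpha * gap(s, a).
    Q, r: (S, A); P: (S, A, S)."""
    V = Q.max(axis=1)                 # (S,) greedy state values
    bellman = r + gamma * (P @ V)     # standard optimality backup, (S, A)
    gap = V[:, None] - Q              # nonnegative action gap
    return bellman - alpha * gap
```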


Aggregation in Stochastic Dynamic Programming

We present a general aggregation method applicable to all finite-horizon Markov decision problems. States of the MDP are aggregated into macro-states based on a pre-selected collection of “distinguished” states which serve as entry points into macro-states. The resulting macro-problem is also an MDP, whose solution approximates an optimal solution to the original problem. The aggregation scheme...
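
The description of the aggregation scheme breaks off above; the sketch below fills in one plausible reading under stated assumptions: each macro-state is entered only through its distinguished state, each macro-level action is a fixed sub-policy inside the macro-state, and the within-macro-state dynamics are treated as an undiscounted absorbing chain (the paper itself is set in finite horizon).

```python
import numpy as np

def macro_model(Q, R, c):
    """Reduce one macro-state under one sub-policy to a macro-action.
    Q: (n, n) transitions among the macro-state's internal states,
    R: (n, m) transitions to the m distinguished entry states elsewhere,
    c: (n,) per-step costs. Row 0 is the entry state by convention."""
    N = np.linalg.inv(np.eye(Q.shape[0]) - Q)   # fundamental matrix
    exit_cost = (N @ c)[0]      # expected cost until leaving the macro-state
    exit_dist = (N @ R)[0]      # (m,) distribution over the next entry state
    return exit_cost, exit_dist

def macro_value_iteration(macro_actions, m, iters=100):
    """macro_actions[d] = list of (exit_cost, exit_dist) pairs available on
    entering macro-state d; m = number of distinguished entry states."""
    V = np.zeros(m)
    for _ in range(iters):
        V = np.array([min(cost + dist @ V for cost, dist in macro_actions[d])
                      for d in range(m)])
    return V
```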


A New Value Iteration Method for the Average Cost Dynamic Programming Problem

We propose a new value iteration method for the classical average cost Markovian decision problem, under the assumption that all stationary policies are unichain and that, furthermore, there exists a state that is recurrent under all stationary policies. This method is motivated by a relation between the average cost problem and an associated stochastic shortest path problem. Contrary to the st...
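
The abstract breaks off while contrasting the new method with the standard one; for context only, here is a sketch of standard relative value iteration, the usual baseline for the average-cost problem. The SSP-based method itself is not reconstructed. Unichain, tabular setting; the reference state and shapes are assumptions.

```python
import numpy as np

def relative_value_iteration(P, g, iters=1000, tol=1e-8):
    """Sketch of standard relative value iteration.
    P: (S, A, S) transitions, g: (S, A) one-stage costs.
    Returns the average-cost estimate and the differential cost vector;
    state 0 is the reference state pinned to zero."""
    S, A, _ = P.shape
    h = np.zeros(S)
    for _ in range(iters):
        Th = (g + P @ h).min(axis=1)   # undiscounted Bellman backup, (S,)
        lam = Th[0]                    # average-cost estimate at reference
        h_new = Th - lam               # normalize to keep iterates bounded
        if np.max(np.abs(h_new - h)) < tol:
            h = h_new
            break
        h = h_new
    return lam, h
```

Subtracting the value at the reference state keeps the iterates bounded; without that normalization the undiscounted backup grows linearly at the rate of the average cost.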



Journal

Journal Title: IEEE Control Systems Letters

Year: 2023

ISSN: 2475-1456

DOI: https://doi.org/10.1109/lcsys.2023.3285655